ggml : add SSM Metal kernels #8546

Merged: 2 commits into master on Aug 26, 2024

Conversation

@ggerganov (Owner) commented on Jul 17, 2024

ref #6758

Straightforward Metal implementation of SSM_CONV and SSM_SCAN using single-threaded kernels, mimicking the CPU implementation. There is lots of room for further optimization; for now the focus is on ensuring correctness.
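
For reference, SSM_CONV applies a short per-channel causal convolution over a rolling window of recent inputs, and SSM_SCAN evaluates the selective state-space recurrence. The C sketch below shows roughly what the two ops compute for a single token; the function names, signatures, and memory layouts are illustrative only and do not correspond to the actual ggml/Metal kernel code.

#include <math.h>

/* Per-channel causal convolution (SSM_CONV-like), one token at a time.
   cx holds the last d_conv inputs for each of the d_inner channels. */
static void ssm_conv_token(int d_inner, int d_conv,
                           const float * cx,  /* [d_inner * d_conv] rolling input window */
                           const float * w,   /* [d_inner * d_conv] per-channel weights  */
                           float       * x) { /* [d_inner]          convolved output     */
    for (int i = 0; i < d_inner; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < d_conv; ++j) {
            acc += cx[i*d_conv + j] * w[i*d_conv + j];
        }
        x[i] = acc;
    }
}

/* Selective scan recurrence (SSM_SCAN-like), one token at a time.
   The hidden state s is updated in place; y receives the per-channel output. */
static void ssm_scan_token(int d_inner, int d_state,
                           const float * x,   /* [d_inner]           input for this token     */
                           const float * dt,  /* [d_inner]           step size (pre-softplus) */
                           const float * A,   /* [d_inner * d_state] state transition         */
                           const float * B,   /* [d_state]           input projection         */
                           const float * C,   /* [d_state]           output projection        */
                           float       * s,   /* [d_inner * d_state] hidden state (in/out)    */
                           float       * y) { /* [d_inner]           output for this token    */
    for (int i = 0; i < d_inner; ++i) {
        /* softplus keeps the discretization step positive */
        const float dt_sp = dt[i] <= 20.0f ? log1pf(expf(dt[i])) : dt[i];
        const float x_dt  = x[i] * dt_sp;
        float acc = 0.0f;
        for (int j = 0; j < d_state; ++j) {
            const int k = i*d_state + j;
            /* discretized recurrence: s = s * exp(dt*A) + dt*B*x */
            const float state = s[k] * expf(dt_sp * A[k]) + B[j] * x_dt;
            s[k] = state;
            /* y = dot(state, C) */
            acc += state * C[j];
        }
        y[i] = acc;
    }
}

The single-threaded Metal kernels in this PR essentially run this style of loop on the GPU, which is why there is still plenty of headroom for parallelizing across channels and state dimensions.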

./llama-batched \
  -m ./models/mamba-130m/ggml-model-f16.gguf \
  -p "Hello, my name is" -np 16 -n 32
main: n_predict = 32, n_ctx = 448, n_batch = 32, n_parallel = 16, n_kv_req = 437

Hello, my name is

main: generating 16 sequences ...

main: stream 0 finished at n_cur = 32
main: stream 1 finished at n_cur = 32
main: stream 2 finished at n_cur = 32
main: stream 3 finished at n_cur = 32
main: stream 4 finished at n_cur = 32
main: stream 5 finished at n_cur = 32
main: stream 6 finished at n_cur = 32
main: stream 7 finished at n_cur = 32
main: stream 8 finished at n_cur = 32
main: stream 9 finished at n_cur = 32
main: stream 10 finished at n_cur = 32
main: stream 11 finished at n_cur = 32
main: stream 12 finished at n_cur = 32
main: stream 13 finished at n_cur = 32
main: stream 14 finished at n_cur = 32
main: stream 15 finished at n_cur = 32

sequence 0:

Hello, my name is Tiffany. I'm a mother of three and a retired teacher. I'm a member of the American Indian and Alaska Native (AI

sequence 1:

Hello, my name is John. I am a freelance writer and editor. I have a passion for writing and have been writing since I was a child. I

sequence 2:

Hello, my name is Renee. I'm a full-time writer, and I'm currently working on a new book. I'm also a graduate

sequence 3:

Hello, my name is Jules. I'm a writer and illustrator. I have a passion for the arts and I love to travel. I love to

sequence 4:

Hello, my name is Renee. I am a single mom of two boys. I am trying to figure out how to make this work. I am

sequence 5:

Hello, my name is Dr. Sonia. I'm a doctor in the University of Medicine and Dentistry of New Jersey. I'm here to help you

sequence 6:

Hello, my name is Nick. I'm a member of the
  National Association of Women in the United States of America. I'm
  a member

sequence 7:

Hello, my name is Jadine. I'm a real person, and I'm here to help you. I'm here to help you get the best

sequence 8:

Hello, my name is Roxane and I'm a young woman with a love of all things chocolate. I've been a member of the Chocolate Club for

sequence 9:

Hello, my name is John. I'm a professional musician, and I'm looking for a new job. I'm a musician, and I'm looking for

sequence 10:

Hello, my name is Dr. Paul, and I'm a doctor in the area of cardiac surgery. I'm here to help you. I'm here to

sequence 11:

Hello, my name is Daniel and I'm a teacher in an elementary school in the United States. I've been reading about the dangers of the internet for the

sequence 12:

Hello, my name is Sven, and I'm a member of the Sven-Gustavsson Foundation. I'm here to talk about the future

sequence 13:

Hello, my name is Nico, I'm a professional photographer, I work in the studio of the famous photographer, Josef Krammer, who is

sequence 14:

Hello, my name is John. I'm a big fan of your work. I'm looking for a job. I'm looking for a good, honest man

sequence 15:

Hello, my name is John. I'm a newbie to the Internet, and I'm trying to learn how to use it.
I'm trying to

main: decoded 432 tokens in 0.71 s, speed: 609.55 t/s

llama_print_timings:        load time =     137.83 ms
llama_print_timings:      sample time =      10.18 ms /   448 runs   (    0.02 ms per token, 44025.16 tokens per second)
llama_print_timings: prompt eval time =     727.16 ms /   437 tokens (    1.66 ms per token,   600.97 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =     845.80 ms /   438 tokens

ggml_metal_free: deallocating

./llama-perplexity \
  -m ./models/mamba-130m/ggml-model-f16.gguf \
  -f build/wikitext-2-raw/wiki.test.raw -ngl 99
perplexity: tokenizing the input ..
perplexity: tokenization took 950.02 ms
perplexity: calculating perplexity over 650 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 0.55 seconds per pass - ETA 1.48 minutes
...
Final estimate: PPL = 25.0894 +/- 0.18559

@ggerganov changed the title from "llama : advanced batch splits" to "ggml : add SSM Metal kernels" on Jul 17, 2024
@github-actions bot added the "testing" label (Everything test related) on Jul 17, 2024
@github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Jul 18, 2024
@ggerganov marked this pull request as ready for review on July 18, 2024 at 12:51
@mofosyne added the "Review Complexity : High" label (Generally requires in-depth knowledge of LLMs or GPUs) on Jul 19, 2024
@ggerganov changed the base branch from compilade/batch-splits to master on August 26, 2024 at 09:26
@ggerganov merged commit fc18425 into master on Aug 26, 2024
8 checks passed
@ggerganov deleted the gg/metal-ssm branch on August 26, 2024 at 14:55
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Aug 27, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* ggml : add ggml_ssm_conv metal impl

* ggml : add ssm_scan metal impl

ggml-ci
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024